Add dataset collection #253

Yunnglin · 2024-12-18T10:04:39Z

Registered datasets:
- race
- trivia_qa
- truthful_qa (service evaluation not supported)
- mmlu
- mmlu_pro
- humaneval
- general_qa
- cmmlu
- arc
- hellaswag (service evaluation not supported)
- bbh
- ceval
- gsm8k
- competition_math
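Support evaluating a model service at a specified URL.
Add a benchmark contribution guide and a mixed-data evaluation guide.
Support mixed-data evaluation with a custom collection schema: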

schema = CollectionSchema(name='math&reasoning', datasets=[
    CollectionSchema(name='math', datasets=[
        DatasetInfo(name='gsm8k', weight=1, task_type='math', tags=['en', 'math']),
        DatasetInfo(name='competition_math', weight=1, task_type='math', tags=['en', 'math']),
        DatasetInfo(name='cmmlu', weight=2, task_type='math', tags=['zh', 'math'], args={'subset_list': ['college_mathematics', 'high_school_mathematics']}),
        DatasetInfo(name='ceval', weight=3, task_type='math', tags=['zh', 'math'], args={'subset_list': ['advanced_mathematics', 'high_school_mathematics', 'discrete_mathematics', 'middle_school_mathematics']}),
    ]),
    CollectionSchema(name='reasoning', datasets=[
        DatasetInfo(name='arc', weight=1, task_type='reasoning', tags=['en', 'reasoning']),
        DatasetInfo(name='ceval', weight=1, task_type='reasoning', tags=['zh', 'reasoning'], args={'subset_list': ['logic']}),
        DatasetInfo(name='race', weight=1, task_type='reasoning', tags=['en', 'reasoning']),
    ]),
])
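To make the role of the nested `weight` values concrete, here is a minimal, self-contained sketch of one way hierarchical weights could be turned into per-dataset sampling shares. It does not use evalscope: `Leaf`, `Group`, and `flatten` are hypothetical stand-ins, and the rule applied (each group splits its share among its children in proportion to their weights, with groups defaulting to weight 1) is an assumption, not a description of how the library's samplers actually work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    """Hypothetical stand-in for a single dataset entry with a weight."""
    name: str
    weight: float = 1.0


@dataclass
class Group:
    """Hypothetical stand-in for a nested collection of datasets."""
    name: str
    children: List[Union["Group", Leaf]] = field(default_factory=list)
    weight: float = 1.0  # assumption: groups default to equal weight


def flatten(node: Union[Group, Leaf],
            share: float = 1.0,
            path: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], float]:
    """Map each leaf (keyed by its full path) to its share of the total."""
    here = path + (node.name,)
    if isinstance(node, Leaf):
        return {here: share}
    total = sum(child.weight for child in node.children)
    shares: Dict[Tuple[str, ...], float] = {}
    for child in node.children:
        # Each child receives a slice of the parent's share, by weight.
        shares.update(flatten(child, share * child.weight / total, here))
    return shares


# Mirror of the weights in the schema above (tags and args omitted).
tree = Group("math&reasoning", [
    Group("math", [
        Leaf("gsm8k", 1),
        Leaf("competition_math", 1),
        Leaf("cmmlu", 2),
        Leaf("ceval", 3),
    ]),
    Group("reasoning", [
        Leaf("arc", 1),
        Leaf("ceval", 1),
        Leaf("race", 1),
    ]),
])

shares = flatten(tree)
for leaf_path, value in shares.items():
    print("/".join(leaf_path), round(value, 4))
```

Keying the result by the full path keeps the two `ceval` entries (math and reasoning) separate, since the same dataset name appears in both groups.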



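from evalscope import TaskConfig, run_task
from evalscope.constants import EvalType

task_cfg = TaskConfig(
    model='qwen2.5',
    # OpenAI-compatible chat completions endpoint of the model service
    api_url='http://127.0.0.1:8801/v1/chat/completions',
    api_key='EMPTY',
    eval_type=EvalType.SERVICE,
    datasets=['data_collection'],
    # Path to the mixed (sampled) dataset file
    dataset_args={'data_collection': {
        'local_path': 'outputs/mixed_data_test.jsonl'
    }},
)
run_task(task_cfg=task_cfg)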

Output:

| task_type | dataset_name | subset_name | average_score | count |
| --- | --- | --- | --- | --- |
| math | ceval | advanced_mathematics | 0.25 | 12 |
| math | ceval | discrete_mathematics | 0.333333 | 3 |
| math | ceval | high_school_mathematics | 0 | 3 |
| math | ceval | middle_school_mathematics | 0 | 3 |
| math | cmmlu | college_mathematics | 0.2 | 5 |
| math | cmmlu | high_school_mathematics | 0.555556 | 9 |
| math | competition_math | default | 0 | 7 |
| math | gsm8k | main | 0.428571 | 7 |
| reasoning | arc | ARC-Challenge | 0.166667 | 6 |
| reasoning | arc | ARC-Easy | 0.5 | 10 |
| reasoning | ceval | logic | 0.25 | 16 |
| reasoning | race | high | 0.285714 | 14 |
| reasoning | race | middle | 0.8 | 5 |
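The per-subset rows above can be rolled up into one score per `task_type`. The sketch below does this with a count-weighted mean, which is an assumption about how one might summarize the table rather than a statement of how the evaluator aggregates internally. The rows are copied from the output; the function name `weighted_average_by_task` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (task_type, dataset_name, subset_name, average_score, count), from the output above
ROWS: List[Tuple[str, str, str, float, int]] = [
    ("math", "ceval", "advanced_mathematics", 0.25, 12),
    ("math", "ceval", "discrete_mathematics", 0.333333, 3),
    ("math", "ceval", "high_school_mathematics", 0.0, 3),
    ("math", "ceval", "middle_school_mathematics", 0.0, 3),
    ("math", "cmmlu", "college_mathematics", 0.2, 5),
    ("math", "cmmlu", "high_school_mathematics", 0.555556, 9),
    ("math", "competition_math", "default", 0.0, 7),
    ("math", "gsm8k", "main", 0.428571, 7),
    ("reasoning", "arc", "ARC-Challenge", 0.166667, 6),
    ("reasoning", "arc", "ARC-Easy", 0.5, 10),
    ("reasoning", "ceval", "logic", 0.25, 16),
    ("reasoning", "race", "high", 0.285714, 14),
    ("reasoning", "race", "middle", 0.8, 5),
]


def weighted_average_by_task(rows) -> Dict[str, Tuple[float, int]]:
    """Return {task_type: (count-weighted mean score, total sample count)}."""
    score_sum: Dict[str, float] = defaultdict(float)
    count_sum: Dict[str, int] = defaultdict(int)
    for task_type, _dataset, _subset, score, count in rows:
        # Weight each subset's mean by the number of samples behind it.
        score_sum[task_type] += score * count
        count_sum[task_type] += count
    return {t: (score_sum[t] / count_sum[t], count_sum[t]) for t in count_sum}


summary = weighted_average_by_task(ROWS)
for task_type, (mean_score, total) in summary.items():
    print(f"{task_type}: {mean_score:.3f} over {total} samples")
```

With these rows, the math group comes out at roughly 0.265 over 49 samples and the reasoning group at roughly 0.353 over 51 samples, for 100 samples in total.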

docs/zh/advanced_guides/collection/schema.md

docs/zh/advanced_guides/collection/sample.md

evalscope/collections/data_generator.py

docs/en/advanced_guides/collection/schema.md

Yunnglin added 15 commits December 18, 2024 18:03

add dataset register

f3c09da

fix circular import

db7f37c

fix lint

1168796

Merge branch 'main' into feat/collection

376afc8

update data adapter

655d49c

update model adapter

2f941a6

split model adapter

a3b9b9f

add server

85b6577

update seed and ceval

78ce442

init collection

4b07449

add collection and sampler

f67322a

remove output

ebcc800

add mix evaluator

0c4e87d

add evaluator

b957f83

register all data

95aa741

Yunnglin changed the title ~~[WIP] Add dataset collection~~ Add dataset collection Dec 24, 2024

Yunnglin added 13 commits December 24, 2024 18:43

update test

1ea478c

merge main

08240b1

add multi level log

fc5574d

add multi level log

aad5561

update parameter

c14c5bc

update dataset

28c09e5

add collection doc

0859738

add benchmark guide

c10595f

update doc

7c572d0

add exception handler

5709748

update import

50b5c85

update doc

50f5a10

update chat template

1bc459d

wangxingjun778 reviewed Jan 3, 2025

docs/en/advanced_guides/collection/schema.md (resolved)

update

e0fc9b4

wangxingjun778 merged commit 3baaf24 into main Jan 3, 2025
1 check passed
